Vwdre – a Vision-based Approach for Mining Data from Search Engine Result Pages
Abstract
Extracting data from dynamically generated web pages is challenging because the results returned by a search engine differ for every query submitted. Many techniques have been proposed to address this issue, but most of them share the problem of language dependency. To overcome the limitations of previous work, a few approaches analyze the visual features of the web page instead. In this paper, we propose a new vision-based approach that is independent of the code used to build the page. It makes broad use of the visual features of search engine result pages to locate the data region and then mine the data records from it. We develop a clustering-by-similarity algorithm to check the similarity of data records. We also propose a technique that generates a wrapper for data record extraction by examining multiple result pages from the same search engine.
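The abstract does not say which visual features or which similarity measure the clustering step uses, so the following is only a minimal sketch of the general idea of grouping rendered blocks by visual similarity. The `Block` fields, the ratio-based `visual_similarity` score, the greedy grouping, and the `0.8` threshold are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A rendered page block. All fields are hypothetical visual features."""
    left: float       # x offset of the block, in pixels
    width: float      # rendered width, in pixels
    height: float     # rendered height, in pixels
    font_size: float  # dominant font size, in points


def _ratio(a: float, b: float) -> float:
    """Similarity in [0, 1] for two non-negative magnitudes."""
    if a == b:
        return 1.0  # also covers the case where both values are zero
    return min(a, b) / max(a, b)


def visual_similarity(a: Block, b: Block) -> float:
    """Average the per-feature ratios; 1.0 means visually identical."""
    parts = [
        _ratio(a.left, b.left),
        _ratio(a.width, b.width),
        _ratio(a.height, b.height),
        _ratio(a.font_size, b.font_size),
    ]
    return sum(parts) / len(parts)


def cluster_by_similarity(blocks, threshold=0.8):
    """Greedily group blocks: a block joins the first cluster whose first
    member it resembles at or above the (assumed) threshold, otherwise it
    starts a new cluster."""
    clusters = []
    for block in blocks:
        for cluster in clusters:
            if visual_similarity(cluster[0], block) >= threshold:
                cluster.append(block)
                break
        else:
            clusters.append([block])
    return clusters
```

Under these assumptions, the largest cluster returned for a result page, for example `max(cluster_by_similarity(blocks), key=len)`, would be the natural candidate for the set of data records, since records on a result page tend to share offset, width, and font.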
Related articles
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the best-known recommendation methods is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and that the active user has not yet rated. As a way of computing recommendations, the ultimate goal of the user-ba...
A Framework for Prefetching Relevant Web Pages using Predictive Prefetching Engine (PPE)
This paper presents a framework for increasing the relevancy of the web pages retrieved by the search engine. The approach introduces a Predictive Prefetching Engine (PPE) which makes use of various data mining algorithms on the log maintained by the search engine. The underlying premise of the approach is that in the case of cluster accesses, the next pages requested by users of the Web server...
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
As the Web grows, establishing a general framework for data mining and for retrieving structured data from the Web poses many challenges. Creating an ontology is a step towards solving this problem. The ontology identifies the main entities and concepts of the data being mined. In this paper, we propose a method for applying "meaning" to the search system, but the problem ...
Expert Discovery: A web mining approach
Expert discovery seeks an answer to the question: “Who is the best expert of a specific subject in a particular domain within a peculiar array of parameters?” Experts with domain knowledge in any field are crucial for consulting in industry, academia and the scientific community. The aim of this study is to address the issues of the expert-finding task in real-world communities. Collabor...
A New Hybrid Method for Web Pages Ranking in Search Engines
There are many algorithms for optimizing search engine results. Ranking takes place according to one or more parameters, such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of the search engine that represents the degree of the vitality...